Capstone Project

Horse Racing and Odds Fluctuations

by Billy Bingham

To-Do List

  • [x] Compile remainder of data
  • [x] Compile complete data dictionary
  • [x] Check all values correct - misspellings etc
  • [x] Break down tasks
  • [x] Compile Python Markdown with To-Do list
  • [x] Work out what to do with NULL values (NULL values have been deleted as they will not help analysis)
  • [x] Import data csv into Python
  • [x] Go through Rubric/Slides to know exactly what is needed
  • [x] Tableau Dashboard
  • [x] Impute Yes/No columns as 1's/0's if necessary
  • [x] Research how to do nicer graphics in Tableau like ones on Tableau Public
  • [x] Attempt a nicer Tableau visual - if it looks good then possible presentation method? Scrollable graphic?
  • [x] Decide on presentation method
  • [x] Research best model to use for my specific analysis
  • [x] Build model on this workbook
  • [x] Test model
  • [x] Build any final charts in Tableau
  • [x] Research report writing styles
  • [x] Report writing (explain my approach)
  • [x] Report writing (strengths and weaknesses in the process)
  • [x] Report writing (write up summary against success metrics)
  • [x] Proofread report, check it against the rubric to make sure everything covered
  • [x] Make the presentation
  • [x] Rehearse presentation, time it to make sure it's not too long/short
  • [ ] Extra: If I have time, try to webscrape punters.com.au to get race times & distances

Import the pandas library and read the CSV file required for analysis and modelling.

In [7]:
import pandas as pd

data = pd.read_csv('~/Downloads/racing.csv')

data.head()
Out[7]:
RaceID Date Meet State RaceNo Winner Opening Starting Fluctuation FlucCat PercFlucCat PercentDiff 1BiggestFluc 2BiggestFluc Jockey Trainer Ground GroundCat Favourite Top2Flucs
0 1 15/12/22 Hawkesbury New South Wales 7 Baranof 4.60 3.0 1.60 Minimal Decent 0.347826 NO NO Joshua Parr John Thompson 3 Good 1 0
1 2 15/12/22 Kyneton Victoria 8 Duke Of Neworleans 46.00 20.0 26.00 Huge Big 0.565217 NO YES Jarrod Fry P A Chow 5 Soft 0 1
2 3 15/12/22 Hawkesbury New South Wales 1 Hamaki 2.25 1.8 0.45 Minimal Decent 0.200000 NO YES Joshua Parr P & P Snowden 3 Good 1 1
3 4 16/12/22 Goulburn New South Wales 1 Raffish 2.10 2.6 -0.50 Negative None/Negative -0.238095 NO NO Koby Jennings James Ponsonby 4 Good 1 0
4 5 16/12/22 Goulburn New South Wales 2 Master Joe 3.10 1.7 1.40 Minimal Big 0.451613 NO YES Jeff Penza Scott Collings 4 Good 1 1

Recode the 'Favourite' column so that 'YES' becomes 1 and 'NO' becomes 0, then convert the column to an integer type so that it can be used in the prediction model.

In [ ]:
data['Favourite'] = data['Favourite'].replace({'YES': 1, 'NO': 0})
data['Favourite'] = data['Favourite'].astype(int)
data.head()
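The preview above already shows 'Favourite' as 1/0, so the recoding may be a no-op on this file. As a minimal sketch, assuming the column could hold either the text labels or the numeric codes, a single mapping can cover both cases. The sample values below are illustrative, not taken from racing.csv.

```python
import pandas as pd

# Illustrative values only: a mix of text labels and already-encoded integers.
favourite = pd.Series(['YES', 'NO', 1, 0])

# Map every accepted representation onto 1/0. Anything unexpected becomes NaN,
# which makes the astype(int) call fail loudly instead of passing silently.
encoding = {'YES': 1, 'NO': 0, 1: 1, 0: 0}
encoded = favourite.map(encoding).astype(int)

print(encoded.tolist())
```

Failing on unmapped values is a deliberate choice here: a misspelt label would otherwise end up in the model as an unnoticed missing value.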

Multiply the 'PercentDiff' column by 100 so that it shows the odds fluctuation for each race as a percentage rather than a decimal fraction.

In [2]:
data['PercentDiff'] = data['PercentDiff']*100
data.head()
Out[2]:
RaceID Date Meet State RaceNo Winner Opening Starting Fluctuation FlucCat PercFlucCat PercentDiff 1BiggestFluc 2BiggestFluc Jockey Trainer Ground GroundCat Favourite Top2Flucs
0 1 15/12/22 Hawkesbury New South Wales 7 Baranof 4.60 3.0 1.60 Minimal Decent 34.782609 NO NO Joshua Parr John Thompson 3 Good 1 0
1 2 15/12/22 Kyneton Victoria 8 Duke Of Neworleans 46.00 20.0 26.00 Huge Big 56.521739 NO YES Jarrod Fry P A Chow 5 Soft 0 1
2 3 15/12/22 Hawkesbury New South Wales 1 Hamaki 2.25 1.8 0.45 Minimal Decent 20.000000 NO YES Joshua Parr P & P Snowden 3 Good 1 1
3 4 16/12/22 Goulburn New South Wales 1 Raffish 2.10 2.6 -0.50 Negative None/Negative -23.809524 NO NO Koby Jennings James Ponsonby 4 Good 1 0
4 5 16/12/22 Goulburn New South Wales 2 Master Joe 3.10 1.7 1.40 Minimal Big 45.161290 NO YES Jeff Penza Scott Collings 4 Good 1 1
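The rows shown are consistent with 'PercentDiff' being the drop from opening to starting price, divided by the opening price and scaled by 100. This formula is inferred from the displayed values rather than stated in the notebook, and `percent_diff` is a hypothetical helper used only to check it:

```python
def percent_diff(opening, starting):
    """Percentage change from opening to starting price, relative to opening.

    Positive means the price shortened (firmed); negative means it drifted.
    """
    return (opening - starting) / opening * 100


# Opening and starting prices copied from the first rows of the table above.
examples = {
    'Baranof': (4.60, 3.0),   # table shows 34.782609
    'Hamaki': (2.25, 1.8),    # table shows 20.000000
    'Raffish': (2.10, 2.6),   # table shows -23.809524
}

for winner, (opening, starting) in examples.items():
    print(winner, round(percent_diff(opening, starting), 6))
```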

Exploratory data analysis using the ydata_profiling library.

In [3]:
from ydata_profiling import ProfileReport

profile = ProfileReport(data, title = "Profiling Report")
profile.to_notebook_iframe()

Set the 'RaceID' column as the index, as it only tracks the number of races recorded.

In [4]:
data = data.set_index('RaceID')
data.head()
Out[4]:
Date Meet State RaceNo Winner Opening Starting Fluctuation FlucCat PercFlucCat PercentDiff 1BiggestFluc 2BiggestFluc Jockey Trainer Ground GroundCat Favourite Top2Flucs
RaceID
1 15/12/22 Hawkesbury New South Wales 7 Baranof 4.60 3.0 1.60 Minimal Decent 34.782609 NO NO Joshua Parr John Thompson 3 Good 1 0
2 15/12/22 Kyneton Victoria 8 Duke Of Neworleans 46.00 20.0 26.00 Huge Big 56.521739 NO YES Jarrod Fry P A Chow 5 Soft 0 1
3 15/12/22 Hawkesbury New South Wales 1 Hamaki 2.25 1.8 0.45 Minimal Decent 20.000000 NO YES Joshua Parr P & P Snowden 3 Good 1 1
4 16/12/22 Goulburn New South Wales 1 Raffish 2.10 2.6 -0.50 Negative None/Negative -23.809524 NO NO Koby Jennings James Ponsonby 4 Good 1 0
5 16/12/22 Goulburn New South Wales 2 Master Joe 3.10 1.7 1.40 Minimal Big 45.161290 NO YES Jeff Penza Scott Collings 4 Good 1 1

Produce a correlation matrix to see where any relationships may be.

In [5]:
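# Restrict to numeric columns; recent pandas versions raise an error if text
# columns such as 'Meet' or 'Jockey' are passed to corr().
data.select_dtypes(include='number').corr()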
Out[5]:
RaceNo Opening Starting Fluctuation PercentDiff Ground Favourite Top2Flucs
RaceNo 1.000000 0.124978 0.067250 0.105243 0.123746 -0.019477 0.004295 0.027278
Opening 0.124978 1.000000 0.758559 0.562308 0.198636 0.081970 -0.385600 0.106846
Starting 0.067250 0.758559 1.000000 -0.112285 -0.329886 0.017508 -0.433125 -0.199167
Fluctuation 0.105243 0.562308 -0.112285 1.000000 0.721561 0.102782 -0.038364 0.415693
PercentDiff 0.123746 0.198636 -0.329886 0.721561 1.000000 0.024721 0.250433 0.554551
Ground -0.019477 0.081970 0.017508 0.102782 0.024721 1.000000 -0.072363 -0.046769
Favourite 0.004295 -0.385600 -0.433125 -0.038364 0.250433 -0.072363 1.000000 0.249757
Top2Flucs 0.027278 0.106846 -0.199167 0.415693 0.554551 -0.046769 0.249757 1.000000

Create a logistic regression model to predict whether a horse will be in the top 2 fluctuations, based on 'Opening' (opening price), 'Starting' (starting price), 'PercentDiff' (percentage difference between opening and starting price), 'Ground' (ground rating, related to how wet the track is: the higher the number, the more likely horses are to be pulled out of the race, which shortens the odds of the remaining horses), and 'Favourite' (whether the horse was the favourite in the race).

In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

train_data, test_data, train_labels, test_labels = train_test_split(data[["Opening","Starting","PercentDiff","Ground","Favourite"]], data["Top2Flucs"], test_size=0.2)

logreg_model = LogisticRegression()
logreg_model.fit(train_data, train_labels)

predictions = logreg_model.predict(test_data)

accuracy = accuracy_score(test_labels, predictions)
print("Accuracy:", accuracy)

conf_matrix = confusion_matrix(test_labels, predictions)
print("Confusion matrix:")
print(conf_matrix)
Accuracy: 0.8148148148148148
Confusion matrix:
[[86 13]
 [17 46]]
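To put the accuracy in context, it can help to compare it with what always guessing the more common class would score on the same test split. The sketch below reuses the confusion matrix printed above and assumes scikit-learn's layout (rows are actual classes, columns are predicted classes, so the matrix reads [[TN, FP], [FN, TP]]):

```python
# Confusion matrix copied from the output above: [[TN, FP], [FN, TP]].
conf_matrix = [[86, 13],
               [17, 46]]
(tn, fp), (fn, tp) = conf_matrix

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Accuracy of a naive model that always predicts the majority class.
actual_negatives = tn + fp
actual_positives = fn + tp
majority_baseline = max(actual_negatives, actual_positives) / total

print("Test rows:", total)
print("Accuracy:", accuracy)
print("Majority-class baseline:", majority_baseline)
```

The gap between the two figures is a rough indication of how much the model adds over guessing, for this particular random split.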

Create a new column called 'Prediction' holding the model's prediction for each observation. Use NumPy to compare each prediction with the actual outcome, and relabel the resulting True/False values as 'Correct' or 'Incorrect' to make the model's performance easier to read. Note that most of these rows were also used to train the model, so the share of 'Correct' labels overstates performance on unseen races.

In [10]:
import numpy as np

data['Prediction'] = logreg_model.predict(data[['Opening', 'Starting','PercentDiff','Ground','Favourite']])
correct_mask = data['Prediction'] == data['Top2Flucs']
data['Prediction'] = np.where(correct_mask, 'Correct', 'Incorrect')
data.head(50)
Out[10]:
RaceID Date Meet State RaceNo Winner Opening Starting Fluctuation FlucCat ... PercentDiff 1BiggestFluc 2BiggestFluc Jockey Trainer Ground GroundCat Favourite Top2Flucs Prediction
0 1 15/12/22 Hawkesbury New South Wales 7 Baranof 4.60 3.00 1.60 Minimal ... 0.347826 NO NO Joshua Parr John Thompson 3 Good 1 0 Incorrect
1 2 15/12/22 Kyneton Victoria 8 Duke Of Neworleans 46.00 20.00 26.00 Huge ... 0.565217 NO YES Jarrod Fry P A Chow 5 Soft 0 1 Correct
2 3 15/12/22 Hawkesbury New South Wales 1 Hamaki 2.25 1.80 0.45 Minimal ... 0.200000 NO YES Joshua Parr P & P Snowden 3 Good 1 1 Correct
3 4 16/12/22 Goulburn New South Wales 1 Raffish 2.10 2.60 -0.50 Negative ... -0.238095 NO NO Koby Jennings James Ponsonby 4 Good 1 0 Correct
4 5 16/12/22 Goulburn New South Wales 2 Master Joe 3.10 1.70 1.40 Minimal ... 0.451613 NO YES Jeff Penza Scott Collings 4 Good 1 1 Correct
5 6 16/12/22 Goulburn New South Wales 3 Mo More Chicken 7.50 9.00 -1.50 Negative ... -0.200000 NO NO Jeff Penza N J Osborne 4 Good 0 0 Correct
6 7 16/12/22 Goulburn New South Wales 4 Nieces and Nephews 3.50 3.70 -0.20 Negative ... -0.057143 NO NO Ryan Bradley B Joseph & P & M Jones 4 Good 0 0 Correct
7 8 16/12/22 Goulburn New South Wales 5 Jasiri 5.00 3.70 1.30 Minimal ... 0.260000 NO NO Koby Jennings M & W & J Hawkes 4 Good 1 0 Incorrect
8 9 16/12/22 Goulburn New South Wales 6 Twig 3.90 5.00 -1.10 Negative ... -0.282051 NO NO Amy Mclucas Matthew Dale 4 Good 0 0 Correct
9 10 16/12/22 Goulburn New South Wales 7 Semana 2.30 1.75 0.55 Minimal ... 0.239130 NO YES Jeff Penza C Maher & D Eustace 4 Good 1 1 Correct
10 11 16/12/22 Geelong Victoria 1 Doublern 3.70 3.10 0.60 Minimal ... 0.162162 NO NO Joe Bowditch Mark & L Kavanagh 4 Good 1 0 Correct
11 12 16/12/22 Geelong Victoria 2 Tolpuddle 4.00 1.80 2.20 Small ... 0.550000 YES NO Blaike Mcdougall G M Begg 4 Good 1 1 Correct
12 13 16/12/22 Geelong Victoria 3 Belthil 2.10 1.45 0.65 Minimal ... 0.309524 YES NO Harry Coffey Andrew Bobbin 4 Good 1 1 Correct
13 14 16/12/22 Geelong Victoria 4 Aramco 10.00 4.50 5.50 Big ... 0.550000 YES NO Zac Spain M Price & M K Jnr 4 Good 0 1 Correct
14 15 16/12/22 Geelong Victoria 5 Resolutions 5.00 4.80 0.20 Minimal ... 0.040000 NO NO Jarrod Fry T W Mulder 4 Good 0 0 Correct
15 16 16/12/22 Geelong Victoria 6 Dyerville 4.20 4.00 0.20 Minimal ... 0.047619 NO NO Dean Yendall M J Williams 4 Good 0 0 Correct
16 17 16/12/22 Geelong Victoria 7 Bella Babe 4.00 11.00 -7.00 Negative ... -1.750000 NO NO M Chadwick C Maher & D Eustace 4 Good 0 0 Correct
17 18 16/12/22 Geelong Victoria 8 Capital Express 2.70 1.75 0.95 Minimal ... 0.351852 NO YES Damien Oliver Nick Ryan 4 Good 1 1 Correct
18 19 16/12/22 Ballina New South Wales 1 Chickerartie 4.20 3.20 1.00 Minimal ... 0.238095 YES NO Les Tilley S B Lee 3 Good 1 1 Correct
19 20 16/12/22 Ballina New South Wales 2 Palawa Kani 4.20 4.80 -0.60 Negative ... -0.142857 NO NO Matthew Mcguren Daniel Bowen 3 Good 0 0 Correct
20 21 16/12/22 Ballina New South Wales 3 Seaczar 7.50 14.00 -6.50 Negative ... -0.866667 NO YES Danny Peisley C R Manson 3 Good 0 1 Incorrect
21 22 16/12/22 Ballina New South Wales 4 I Shot The Sheriff 8.00 6.50 1.50 Minimal ... 0.187500 NO NO John Grisedale B F Cavanough 3 Good 0 0 Correct
22 23 16/12/22 Ballina New South Wales 5 Centre Bounce 2.50 1.85 0.65 Minimal ... 0.260000 NO YES Matthew Mcguren M J Dunn 3 Good 1 1 Correct
23 24 16/12/22 Ballina New South Wales 6 Little Vista 11.00 6.50 4.50 Small ... 0.409091 NO YES Morgan Butler W Bannerot 3 Good 0 1 Correct
24 25 16/12/22 Ballina New South Wales 7 Starter 9.50 12.00 -2.50 Negative ... -0.263158 NO NO Luke Dittman Allan Chau 3 Good 0 0 Correct
25 26 16/12/22 Ballina New South Wales 8 The Tyler 7.00 3.60 3.40 Small ... 0.485714 NO YES Ben Looker L J Hatch 3 Good 0 1 Correct
26 27 16/12/22 Warren New South Wales 1 Jade Division 7.00 11.00 -4.00 Negative ... -0.571429 NO NO S Ingelse Brett Thompson 5 Soft 0 0 Correct
27 28 16/12/22 Warren New South Wales 2 Listen To The Band 5.00 2.80 2.20 Small ... 0.440000 NO NO J Pracey-Holmes C Lundholm 5 Soft 1 0 Incorrect
28 29 16/12/22 Warren New South Wales 3 No Debt 7.50 5.00 2.50 Small ... 0.333333 NO NO Jake Barrett Brett Robb 5 Soft 0 0 Correct
29 30 16/12/22 Warren New South Wales 4 Allchosen 8.00 7.00 1.00 Minimal ... 0.125000 NO NO Brooke Stower G D Lunn 5 Soft 0 0 Correct
30 31 16/12/22 Warren New South Wales 5 Not Negotiating 13.00 11.00 2.00 Minimal ... 0.153846 NO NO A Stanley Peter W Stanley 5 Soft 0 0 Correct
31 32 16/12/22 Warren New South Wales 6 Planet Ex 5.50 3.20 2.30 Small ... 0.418182 YES NO Clayton Gallagher W Collison 5 Soft 0 1 Correct
32 33 16/12/22 Warren New South Wales 7 Lady Lucilla 10.00 9.00 1.00 Minimal ... 0.100000 NO NO J Pracey-Holmes C Lundholm 5 Soft 0 0 Correct
33 34 16/12/22 Rockhampton Queensland 1 Kings County 15.00 17.00 -2.00 Negative ... -0.133333 NO NO Chris Whiteley Adrian Coome 5 Soft 0 0 Correct
34 35 16/12/22 Rockhampton Queensland 2 Mocial Chief 3.40 2.10 1.30 Minimal ... 0.382353 NO YES Ashley Butler K N Smyth 5 Soft 1 1 Correct
35 36 16/12/22 Rockhampton Queensland 3 Highground 2.50 1.85 0.65 Minimal ... 0.260000 NO NO Jason Taylor Nick Walsh 5 Soft 1 0 Correct
36 37 16/12/22 Rockhampton Queensland 4 Falsetto 2.40 1.40 1.00 Minimal ... 0.416667 YES NO Justin Stanley Clinton Taylor 5 Soft 1 1 Correct
37 38 16/12/22 Rockhampton Queensland 5 Under The Limit 4.60 2.50 2.10 Small ... 0.456522 NO YES Ashley Butler R Tyrell & T Button 5 Soft 1 1 Correct
38 39 16/12/22 Rockhampton Queensland 6 Art By Concorde 6.00 5.50 0.50 Minimal ... 0.083333 NO NO Adam Sewell C Smith 5 Soft 0 0 Correct
39 40 16/12/22 Rockhampton Queensland 7 I Promise You 5.00 2.60 2.40 Small ... 0.480000 YES NO Isabella Rabjones William Kropp 5 Soft 1 1 Correct
40 41 16/12/22 Rockhampton Queensland 8 Divine Purpose 2.45 1.70 0.75 Minimal ... 0.306122 NO YES Justin Stanley Clinton Taylor 5 Soft 1 1 Correct
41 42 17/12/22 Flemington Victoria 1 She Dances 10.00 10.00 0.00 No Change ... 0.000000 NO NO Luke Nolen P G Moody 3 Good 0 0 Correct
42 43 17/12/22 Flemington Victoria 2 For Real Life 7.00 9.00 -2.00 Negative ... -0.285714 NO NO Blaike Mcdougall L & T Corstens 3 Good 0 0 Correct
43 44 17/12/22 Flemington Victoria 3 Pounding 3.40 3.10 0.30 Minimal ... 0.088235 NO NO Jamie Kah P G Moody 3 Good 1 0 Correct
44 45 17/12/22 Flemington Victoria 4 Hasseltoff 12.00 9.00 3.00 Small ... 0.250000 NO NO Craig Williams Tom Dabernig 3 Good 0 0 Correct
45 46 17/12/22 Flemington Victoria 5 Invincible Caviar 4.00 3.00 1.00 Minimal ... 0.250000 NO YES Jamie Kah P G Moody 3 Good 1 1 Correct
46 47 17/12/22 Flemington Victoria 6 Persan 3.10 2.60 0.50 Minimal ... 0.161290 NO YES Harry Coffey C Maher & D Eustace 3 Good 1 1 Incorrect
47 48 17/12/22 Flemington Victoria 7 Ashford Street 20.00 10.00 10.00 Big ... 0.500000 YES NO Teo Nugent K M Elford 3 Good 0 1 Correct
48 49 17/12/22 Flemington Victoria 8 Nicolini Vito 5.50 7.50 -2.00 Negative ... -0.363636 NO NO Damien Oliver Ben & J D Hayes 3 Good 0 0 Correct
49 50 17/12/22 Flemington Victoria 9 He's Xceptional 7.50 9.00 -1.50 Negative ... -0.200000 NO NO Thomas Stockdale T Busuttin & N Young 3 Good 0 0 Correct

50 rows × 21 columns

Perform cross-validation, as I don't have a secondary dataset on which to test this model. Cross-validating with 10 folds splits the training data into 10 groups. Each group is used once as the test set while the model is trained on the remaining 9, so the model is evaluated 10 times. I then print the mean accuracy over these 10 iterations.

In [12]:
from sklearn.model_selection import cross_val_score
train_data, test_data, train_labels, test_labels = train_test_split(data[["Opening","Starting","PercentDiff","Ground","Favourite"]], data["Top2Flucs"], test_size=0.2)

logreg_model = LogisticRegression()

scores = cross_val_score(logreg_model, train_data, train_labels, cv=10)
mean_score = scores.mean()

# No confusion matrix is printed here: `predictions` came from the earlier split,
# so comparing it with these new test labels would be meaningless. The model is
# refitted and evaluated on this split in the precision/recall cell below.
print("Mean accuracy score: ", mean_score)
Mean accuracy score:  0.8244471153846155
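The fold mechanics can be sketched in plain Python. This is only an illustration of how k-fold splitting works, not scikit-learn's implementation: `k_fold_indices` is a hypothetical helper, the sample count of 50 is arbitrary, and for classifiers scikit-learn's default splitter additionally keeps the class proportions similar in each fold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 50 is an arbitrary illustrative sample count, not the size of the racing data.
folds = list(k_fold_indices(n_samples=50, k=10))

for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train on {len(train_idx)} rows, test on {len(test_idx)} rows")
```

Every row appears in exactly one test fold, so each observation is scored once by a model that never saw it during training.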

Test the precision, recall and F1 score of the logistic regression model. If the aim is a low proportion of false positives, then we would like to see a high value for precision; a high value for recall indicates a low proportion of false negatives.

In [15]:
from sklearn.metrics import precision_score, recall_score, f1_score
logreg_model.fit(train_data, train_labels)
test_pred_labels = logreg_model.predict(test_data)
conf_matrix = confusion_matrix(test_labels, test_pred_labels)

precision = precision_score(test_labels, test_pred_labels)
recall = recall_score(test_labels, test_pred_labels)
f1 = f1_score(test_labels, test_pred_labels)

print("Confusion matrix:")
print(conf_matrix)
print("Precision:", precision)
print("Recall:", recall)
print("F1 score:", f1)
Confusion matrix:
[[84 14]
 [16 48]]
Precision: 0.7741935483870968
Recall: 0.75
F1 score: 0.7619047619047619
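The three scores can be reproduced by hand from the confusion matrix, which shows which kind of error each one penalises. A minimal sketch, assuming scikit-learn's layout of [[TN, FP], [FN, TP]] with class 1 as the positive class:

```python
# Confusion matrix copied from the output above: [[TN, FP], [FN, TP]].
conf_matrix = [[84, 14],
               [16, 48]]
(tn, fp), (fn, tp) = conf_matrix

# Precision: of the races predicted as top-2 fluctuations, how many really were.
precision = tp / (tp + fp)

# Recall: of the races that really were top-2 fluctuations, how many were found.
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print("Precision:", precision)
print("Recall:", recall)
print("F1 score:", f1)
```

False positives appear only in the precision formula and false negatives only in the recall formula, which is why the choice between the two depends on which error is more costly.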